# Multimodal Large Language Model Visual Backbone

MLCD ViT Large Patch14 336

A visual feature extraction model based on the ViT-L/14 architecture at 336px input resolution, reported to surpass CLIP on multiple multimodal benchmarks.
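
To make the "ViT-L/14 at 336px" notation concrete, the short sketch below works out how many tokens such a backbone would hand to a downstream language model. It assumes standard ViT patchification (non-overlapping square patches) and an optional class token; the helper name `vit_token_count` is hypothetical, and whether this particular model keeps a class token is an assumption, not something stated on this page.

```python
def vit_token_count(image_size: int = 336, patch_size: int = 14,
                    class_token: bool = True) -> int:
    """Count the tokens a plain ViT produces for one square image.

    Assumes non-overlapping square patches, as in the standard ViT design.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 336 // 14 = 24
    num_patches = patches_per_side ** 2           # 24 * 24 = 576
    # An optional class token adds one extra position to the sequence.
    return num_patches + (1 if class_token else 0)


print(vit_token_count())                   # 577 (576 patches + 1 class token)
print(vit_token_count(class_token=False))  # 576
```

Under these assumptions, the 336px input gives a 24 x 24 patch grid, compared with 16 x 16 for the same patch size at 224px, which is why the higher resolution yields finer-grained visual features at the cost of a longer token sequence.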

Multimodal Fusion
